Summary

We will use PyMC3 to estimate the posterior PDF for the true rating of a set of artificial teams using data from a simulated season. The idea is to test our model on a small set of artificial data where we know the answer to begin with, so we can learn about MCMC and make sure our model is sensible.


In [1]:
import pandas as pd
import os
import numpy as np
import pymc3 as pm
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
true_rating = {
    'All Stars': 2.0,
    'Average': 0.0,
    'Just Having Fun': -1.2,
}

true_index = {
    0: 'All Stars',
    1: 'Average',
    2: 'Just Having Fun',
}

n_teams = len(true_rating)
team_numbers = range(n_teams)
team_names = [true_index[i] for i in team_numbers]

In [3]:
true_rating


Out[3]:
{'All Stars': 2.0, 'Average': 0.0, 'Just Having Fun': -1.2}

In [4]:
team_names


Out[4]:
['All Stars', 'Average', 'Just Having Fun']

We have two teams, one of which is much better than the other. Let's make a simulated season between these teams.


In [5]:
season_length = [5, 20, 100]
traces = []
simulatedSeasons = []
for n_games in season_length:
    games = range(n_games)
    database = []
    for game in games:
        game_row = {}
        matchup = np.random.choice(team_numbers, size=2, replace=False)
        team0 = true_index[matchup[0]]
        team1 = true_index[matchup[1]]
        game_row['Team A'] = team0
        game_row['Team B'] = team1
        game_row['Index A'] = matchup[0]
        game_row['Index B'] = matchup[1]

        deltaRating = true_rating[team0] - true_rating[team1]

        p = 1 / (1 + np.exp(-deltaRating))

        randomNumber = np.random.random()

        outcome_A = p > randomNumber
        game_row['Team A Wins'] = outcome_A

        database.append(game_row)
    
    simulatedSeason = pd.DataFrame(database)
    simulatedSeasons.append(simulatedSeason)
    
    with pm.Model() as model:
        rating = pm.Normal('rating', mu=0, sd=1, shape=n_teams)

        deltaRating = rating[simulatedSeason['Index A'].values] - rating[simulatedSeason['Index B'].values]

        p = 1 / (1 + np.exp(-deltaRating))

        win = pm.Bernoulli('win', p, observed=simulatedSeason['Team A Wins'].values)
        
        trace = pm.sample(1000)
        
        traces.append(trace)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [rating]
100%|██████████| 1500/1500 [00:01<00:00, 1121.17it/s]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [rating]
100%|██████████| 1500/1500 [00:01<00:00, 1314.11it/s]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [rating]
100%|██████████| 1500/1500 [00:01<00:00, 1029.69it/s]

In [136]:
simulatedSeasons[1].groupby('Team A').sum()


Out[136]:
Index A Index B Team A Wins
Team A
All Stars 0 13 7.0
Average 4 6 2.0
Just Having Fun 16 6 1.0

In [135]:
1 / (1 + np.exp(-2))


Out[135]:
0.88079707797788231

In [6]:
sns.set_context('poster')
f, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 15))
# plt.figure(figsize=(10, 5))
for ax_index, n_games in enumerate(season_length):
    ax = axes[ax_index]
    for team_number in team_numbers:
        rating_posterior = traces[ax_index]['rating'][:, team_number]
        team_name = true_index[team_number]
        sns.distplot(rating_posterior, label=team_name, ax=ax)
    ax.legend()
    ax.set_xlabel('Rating')
    ax.set_ylabel('Density')
    ax.set_title("Season length: {} games".format(n_games))
plt.tight_layout()



In [97]:
simulatedSeason = pd.DataFrame(database)

In [98]:
simulatedSeason


Out[98]:
Index A Index B Team A Team A Wins Team B
0 1 0 Just Having Fun False All Stars
1 0 1 All Stars True Just Having Fun
2 1 0 Just Having Fun False All Stars
3 0 1 All Stars True Just Having Fun
4 1 0 Just Having Fun False All Stars
5 1 0 Just Having Fun False All Stars
6 0 1 All Stars True Just Having Fun
7 1 0 Just Having Fun True All Stars
8 0 1 All Stars True Just Having Fun
9 1 0 Just Having Fun False All Stars

In [75]:
project_dir = '/Users/rbussman/Projects/BUDA/buda-ratings'
scores_dir = os.path.join(project_dir, 'data', 'raw', 'game_scores')
simulatedSeason.to_csv(os.path.join(scores_dir, 'artificial_scores_big.csv'))

Prior on each team is a normal distribution with mean of 0 and standard deviation of 1.


In [99]:
simulatedSeason.shape


Out[99]:
(10, 5)

In [100]:
with pm.Model() as model:
    rating = pm.Normal('rating', mu=0, sd=1, shape=n_teams)

    deltaRating = rating[simulatedSeason['Index A'].values] - rating[simulatedSeason['Index B'].values]
       
    p = 1 / (1 + np.exp(-deltaRating))
    
    win = pm.Bernoulli('win', p, observed=simulatedSeason['Team A Wins'].values)

In [101]:
with model:
    trace = pm.sample(1000)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [rating]
100%|██████████| 1500/1500 [00:00<00:00, 1586.93it/s]

In [113]:
sns.set_context('poster')
plt.figure(figsize=(10, 5))
for team_number in team_numbers:
    rating_posterior = trace['rating'][:, team_number]
    team_name = true_index[team_number]
    sns.distplot(rating_posterior, label=team_name)
plt.legend()
plt.xlabel('Rating')
plt.ylabel('Density')


Out[113]:
<matplotlib.text.Text at 0x1c1ce0ce90>

Hmm, something looks odd here. The posterior pdf for these two teams has significant overlap. Does this mean that our model is not sure about which team is better?


In [114]:
sns.set_context('poster')
plt.figure(figsize=(10, 5))
for team_number in team_numbers[:-1]:
    rating_posterior = trace['rating'][:, team_number] - trace['rating'][:, -1]
    team_name = true_index[team_number]
    sns.distplot(rating_posterior, label="{} - {}".format(team_name, true_index[team_numbers[-1]]))
plt.legend()
plt.xlabel('Rating')
plt.ylabel('Density')


Out[114]:
<matplotlib.text.Text at 0x1c1d178f90>

In [119]:
gt0 = rating_posterior > 0
print("Percentage of samples where 'All Stars' have a higher rating than 'Just Having Fun': {:.2f}%".format(
    100. * rating_posterior[gt0].size / rating_posterior.size))


Percentage of samples where 'All Stars' have a higher rating than 'Just Having Fun': 99.22%

Ah, so the posterior pdf is actually quite clear: There is a 99.22% chance that "All Stars" are better than "Just Having Fun".

How does our confidence change as a function of the number of games in the season?


In [104]:
rating_posterior


Out[104]:
array([ 1.35253041,  2.20214716,  2.55056653, ...,  2.2638182 ,
       -0.15597247,  0.4021902 ])

In [88]:
.75 ** 14


Out[88]:
0.017817948013544083

In [82]:
estimatedratings = trace['rating'].mean(axis=0)

In [83]:
estimatedratings


Out[83]:
array([ 1.50334729,  0.49968655, -0.73728304, -1.23310912])

In [47]:
for key in true_rating:
    print("True: {:.2f}; Estimated: {:.2f}".format((true_rating[key], estimatedratings[key]))


  File "<ipython-input-47-538968eb2796>", line 2
    print("True: {:.2f}; Estimated: {:.2f}".format((true_rating[key], estimatedratings[key]))
                                                                                             ^
SyntaxError: invalid syntax

In [22]:
key


Out[22]:
2

In [ ]: